Národní úložiště šedé literatury Nalezeno 1 záznamů.  Hledání trvalo 0.00 vteřin. 
Automatic Webpage Content Categorisation and Extraction
Rein, Michal ; Koutenský, Michal (oponent) ; Dolejška, Daniel (vedoucí práce)
This thesis describes the development of a flexible system for automatically categorising and extracting content from web pages, with a focus on the darknet environment. We have designed a highly adaptable and scalable system capable of handling any type of content, while taking great care in considering the overall architecture, database structure, and processing pipeline. Using the state-of-the-art language model trained on the natural language inference task, we demonstrate the model's potential to categorise content effectively in a zero-shot environment. We also conduct an analysis of the performance of various hypothesis templates. To further enhance the data extraction process, we have integrated a named entity recognition model and templating methodology for content extraction and proposed an automated segmentation approach using OpenAI's ChatGPT model. In addition, we have developed a user-friendly web client application to enhance the system's accessibility and ease-of-use, evaluated the achieved results, and identified areas for further research and development in this field.

Chcete být upozorněni, pokud se objeví nové záznamy odpovídající tomuto dotazu?
Přihlásit se k odběru RSS.